Evaluation of statistical methods used in the analysis of interrupted time series studies: a simulation study


Interrupted time series (ITS): model and estimation methods

We begin by describing the statistical model and parameters used in our simulation study followed by a brief description of some common statistical estimation methods and the Durbin-Watson test for autocorrelation.

Statistical model

We use a segmented linear regression model with a single interruption, which can be written using the parameterisation proposed by Huitema and McKean [18, 19] as:

$$ Y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 \left[t - T_I\right] D_t + \varepsilon_t $$ (1)

where Yt represents the continuous outcome at time point t of N time points. Dt is an indicator variable for the post-interruption interval (i.e. Dt = 1 if t ≥ TI and 0 otherwise, where TI represents the time of the interruption). The model parameters β0, β1, β2 and β3 represent the intercept (e.g. baseline rate), the slope in the pre-interruption interval, the change in level and the change in slope, respectively. The error term, εt, represents deviations from the fitted model and is constructed as:

$$ \varepsilon_t = \rho \varepsilon_{t-1} + w_t $$ (2)

where wt represents “white noise” that is normally distributed, wt ~ N(0, σ²), and ρ is the lag-1 autocorrelation of the errors, which can range from −1 to +1. With lag-1 errors, the influence of past errors on the current error is restricted to the value immediately prior. Longer lags are possible, but in this paper we confine attention to lag-1 only (AR(1) errors).
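The data-generating process in Eqs. (1) and (2) can be sketched in Python. The paper's simulations were coded in Stata, so everything here (function name, defaults, midpoint interruption) is an illustrative assumption; the first error is scaled by sqrt(1/(1−ρ²)) so that the error variance is stationary, as described in the simulation methods below.

```python
import numpy as np

def simulate_its(n=48, beta=(0.0, 0.1, 1.0, 0.1), rho=0.4, sigma=1.0, seed=0):
    """Simulate one ITS series from Eq. (1) with AR(1) errors, Eq. (2).

    The interruption is placed at the series midpoint (an assumption for
    this sketch); the first error is scaled by sqrt(1/(1 - rho^2)) so the
    error variance is constant at all time points.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1)
    T_I = n // 2 + 1                      # time of interruption (midpoint)
    D = (t >= T_I).astype(float)          # post-interruption indicator D_t
    w = rng.normal(0.0, sigma, size=n)    # white noise w_t ~ N(0, sigma^2)
    eps = np.empty(n)
    eps[0] = w[0] * np.sqrt(1.0 / (1.0 - rho**2))  # stationary start
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + w[i]  # Eq. (2): AR(1) errors
    b0, b1, b2, b3 = beta
    y = b0 + b1 * t + b2 * D + b3 * (t - T_I) * D + eps  # Eq. (1)
    return t, D, y
```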

Estimation methods

A range of statistical estimation methods are available for estimating the model parameters. These methods account for autocorrelation in different ways and are briefly described below. We focus on statistical methods that have been more commonly used (Ordinary Least Squares (OLS), Generalised Least Squares (GLS), Newey-West (NW), Autoregressive Integrated Moving Average (ARIMA)) [2,3,4, 6]. In addition, we have included Restricted Maximum Likelihood (REML) (with and without the Satterthwaite adjustment), which, although not in common use, is included because of its potential for reduced bias in the estimation of the autocorrelation parameter, as has been discussed for general (non-interrupted) time series [20]. Further details and equations can be found in Additional file 1.

Ordinary least squares

Estimates of the regression parameters and their variances from model (1) can be obtained by fitting a segmented linear regression model using OLS (Additional file 1). In the presence of autocorrelation, the OLS estimators for the regression parameters are unbiased; however, the SEs will be incorrect [21].

Newey-West

The NW estimator of the variance of the regression parameters estimated using OLS accommodates autocorrelation and heteroskedasticity of the error terms in the regression model (1) [22] (Additional file 1).

Generalised least squares

Two common GLS methods for estimating the regression parameters and their variances are Cochrane-Orcutt (CO) and Prais-Winsten (PW). For both methods, a regression model is first fitted using OLS and an estimate of the autocorrelation is calculated from the residuals. This estimate is then used to transform the data and remove the autocorrelation from the errors, upon which the regression parameters are then estimated from the transformed data. If there is still some residual autocorrelation these steps are iterated until a criterion is met (e.g., the estimated value for autocorrelation has converged [23]). The CO method applies the transformation from the second observation onwards (t = 2, 3, … n). The PW method is a modification to the CO method in which a transformed value is used for the first observation (Additional file 1). The PW method is therefore likely to be more efficient in small series since it does not discard the first observation. The sampling properties of the estimators of the regression parameters are likely to be adversely affected when the series length is small due to poor estimation of the autocorrelation.

Restricted maximum likelihood

It is well known that maximum likelihood estimators of variance components are biased in small samples due to not accounting for the degrees of freedom (d.f.) used when estimating the fixed effect regression parameters [24]. Restricted maximum likelihood is a variant of maximum likelihood estimation and attempts to address the bias by separating the log-likelihood into two terms; one that involves the mean and variance parameters, and one which is only dependent on the variance parameters. By maximising the latter term first with the appropriate number of d.f., an estimate of the variance parameter can be obtained which can be used when maximising the former, thus correctly accounting for the d.f. [20, 25].

For small samples, there is greater uncertainty in the estimation of the SE of the regression parameters. To account for this uncertainty in making inferences about the regression parameters, the Satterthwaite adjustment can be used to adjust the t-distribution d.f. used in hypothesis testing and calculation of confidence limits [26].

ARIMA/ARMAX regression with autoregressive errors estimated using maximum likelihood

A more flexible approach than the Prais-Winsten and Cochrane-Orcutt generalised least squares methods is Autoregressive Moving Average eXogenous modelling (ARMAX) [27]. Here we consider a simple case in which the exogenous variables are the functions of time that form a segmented regression model, and the errors are assumed to have an AR(1) structure. Parameters in this more general family of models are estimated by maximum likelihood, enabling the uncertainty in the autocorrelation estimate to be taken into account in the standard error of the regression coefficients, unlike PW or CO. This approach has been variously labelled in the literature, including use of the terminology ‘maximum likelihood ARIMA’ [14]. We therefore use the shorthand term “ARIMA” for consistency with previous literature, including in our companion paper [28]. Further details about the method can be found in Additional file 1, Paolella [27], Nelson [29] and Box et al. [30].

Durbin-Watson test for autocorrelation

The Durbin-Watson (DW) test is commonly used for detecting lag-1 autocorrelation in time series. Often, the test is used as part of a two-stage analysis strategy to determine whether to use a method that adjusts for autocorrelation or use OLS (which does not adjust for autocorrelation). The null hypothesis is that there is no autocorrelation (H0 : ρ = 0) against the alternative that autocorrelation is present (H1 : ρ ≠ 0). The DW-statistic can range between zero and four, with values close to two indicating no autocorrelation. The DW-statistic is compared to critical values to determine whether there is evidence of autocorrelation, no autocorrelation, or the test is inconclusive. The critical values differ by series length, significance level and the d.f. in the regression model. Further details are available in Additional file 1, Kutner et al. [21] and Durbin and Watson [31].

Simulation study methods

We undertook a numerical simulation study, examining the performance of a set of statistical methods under a range of scenarios which included continuous data with different level and slope changes, varying lengths of series and magnitudes of lag-1 autocorrelation. Design parameter values were combined using a fully factorial design with 10,000 data sets generated per combination. A range of criteria were used to evaluate the performance of the statistical methods. We now describe the methods of the simulation study using the ADEMP (defining aims, data-generating mechanisms, estimands, methods and performance measures) structure [32].

Data generating mechanisms

We simulated continuous data from ITS studies by randomly sampling from a parametric model (Eq. 1), with a single interruption at the midpoint, and first order autoregressive errors (examples shown in Supplementary 1.1). We multiplied the first error term, ε1, by \( \sqrt{\frac{1}{1-\rho^2}} \) so that the variance of the error term was constant at all time points.

We created a range of simulation scenarios including different values of the model parameters and different numbers of data points per series (Table 1). These values were informed by our review of ITS studies [4], where we reanalysed available data sets to estimate level and slope changes (standardised by the residual standard deviation), and autocorrelation. We found a median standardised level change of 1.5 (inter-quartile range (IQR): 0.6 to 3.0, n = 190), median standardised slope change of 0.13 (IQR: 0.06 to 0.27, n = 190) and median autocorrelation of 0.2 (IQR: 0 to 0.6, n = 180). We therefore constructed models with level changes (β2) of 0, 0.5, 1 and 2, and slope changes (β3) of 0 and 0.1. We did not examine negative level or slope changes since we did not expect this to influence the performance metrics. Lag-1 autocorrelation was varied between 0 and 0.8 in increments of 0.2 to cover the full range of autocorrelations observed in the ITS studies included in the review. The number of data points per series was varied from 6 to 100, equally divided before and after the interruption, informed by the number of data points observed in the ITS studies (median 48, IQR: 30 to 100, n = 230). The increment between the numbers of data points per series varied; initially it was small (i.e. 2) so as to detect changes in the performance metrics that were expected to arise with smaller sample sizes, and was increased to 4 and then 20.

Table 1 Simulation parameters

All combinations of the factors in Table 1 were simulated, leading to 800 different simulation scenarios (Table 1, Fig. 2).
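The fully factorial design can be sketched as a Cartesian product of the parameter grids. The level change, slope change and autocorrelation values are taken from the text; the exact set of 20 series lengths is a hypothetical reconstruction (stepping by 2, then 4, then 20, from 6 to 100) chosen to reproduce the 4 × 2 × 5 × 20 = 800 reported scenarios:

```python
from itertools import product

level_changes = [0, 0.5, 1, 2]                 # beta_2 (from the text)
slope_changes = [0, 0.1]                       # beta_3 (from the text)
autocorrelations = [0.0, 0.2, 0.4, 0.6, 0.8]   # rho (from the text)
# Assumed grid of 20 series lengths; the paper's exact grid may differ
lengths = list(range(6, 26, 2)) + list(range(28, 56, 4)) + [60, 80, 100]

scenarios = list(product(level_changes, slope_changes,
                         autocorrelations, lengths))
```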

Fig. 2

Structure of the eight models constructed from different combinations of the model input parameters (Table 1)

Estimands and other targets

The primary estimands of the simulation study are the parameters of the model, β2 (level change) and β3 (slope change) (Eq. 1). These were chosen as they are commonly reported effect measures [4, 6]. We also examined the autocorrelation coefficient, ρ, and the value of the Durbin-Watson statistic.

Statistical methods to analyse ITS studies

Segmented linear regression models were fitted using the estimation methods described in Section 2.2. We evaluated estimation methods designed to estimate the model parameters under lag-1 autocorrelation (see Table 2 for details). For GLS, we restricted our investigation to the PW method, because it was expected to have better performance than the CO method (on which PW is based) given the PW method utilises all data points. For REML with the Satterthwaite adjustment, we substituted d.f. of 2 when the computed d.f. were less than 2, to avoid overly conservative confidence limits and hypothesis tests. We also investigated the commonly used Durbin-Watson method for detecting autocorrelation at a significance level of 0.05 [31].

Table 2 Statistical methods and adjustments for autocorrelation

Table 2 summarises the methods and model variations used to adjust for autocorrelation. Details of the Stata code used for generating the simulated data and the analysis code can be found in the online repository figshare [33].

Performance measures

The performance of the methods was evaluated by examining bias, empirical SE, model-based SE, 95% confidence interval coverage and power (see Additional file 1 for formulae). Confidence intervals were calculated using the simsum package [34] with t-distribution critical values. For each simulation scenario, we used 10,000 repetitions in order to keep the Monte Carlo Standard Error (MCSE) below 0.5% for all potential values of coverage and type I error rate. Model non-convergence was recorded and tabulated.
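These performance measures can be sketched as follows for a single scenario and estimand; the paper computed them with the Stata simsum package, so this Python version is illustrative only (confidence intervals use t-distribution critical values, as in the text):

```python
import numpy as np
from scipy import stats

def performance(est, se, true_value, df):
    """Bias, empirical SE, mean model-based SE, and 95% CI coverage
    across simulation repetitions of one scenario."""
    est, se = np.asarray(est, float), np.asarray(se, float)
    bias = est.mean() - true_value          # mean error of the estimates
    emp_se = est.std(ddof=1)                # SD of estimates across reps
    model_se = se.mean()                    # average model-based SE
    tcrit = stats.t.ppf(0.975, df)          # t critical value
    covered = np.abs(est - true_value) <= tcrit * se
    coverage = covered.mean()               # proportion of CIs covering truth
    return bias, emp_se, model_se, coverage
```

With 10,000 repetitions, the Monte Carlo SE of a coverage proportion p is sqrt(p(1−p)/10000), which is at most 0.5% (at p = 0.5), consistent with the target stated above.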

Coding and execution

The statistical software Stata version 15 [35] was used for the generation of the simulated data. A random seed was set at the beginning of the process and the individual random state was recorded for each repetition of the simulated data sets. Each dataset was independently simulated, using consecutive randomly generated numbers from the starting seed. We used a “burn in” period between each dataset of 300 random number generations so that any lag effects specific to the computer-generated series had time to dissipate [11].

Prior to running the simulations, we undertook initial checks to confirm that the data generation mechanism was working as expected. This involved fitting series of length 100,000 to check the estimated β parameters matched the input parameters. A larger sample of 1000 datasets was then simulated and checked using summary statistics and graphs. When we were satisfied that the simulations were operating as expected, the full number of datasets were simulated.

Analysis of the simulated datasets

Analyses were performed using Stata version 15 [35]. A range of visual displays were constructed to compare the performance of the statistical methods. Frequency distributions were plotted to visualise the level- and slope-change estimates, autocorrelation coefficient estimates, and the results of the Durbin-Watson test for autocorrelation. Scatter plots were used to display the mean values for empirical and model-based SEs, coverage, power and autocorrelation coefficient estimates. Line plots were used to show confidence intervals for the level and slope change estimates. Results and summaries of the analyses were summarised (using the simsum package [34]) and graphed using Stata version 15 [35].


